Competitive Classification and Closeness Testing
نویسندگان
چکیده
We study the problems of classification and closeness testing. A classifier associates a test sequence with the one of two training sequences that was generated by the same distribution. A closeness test determines whether two sequences were generated by the same or by different distributions. For both problems all natural algorithms are symmetric—they make the same decision under all symbol relabelings. With no assumptions on the distributions’ support size or relative distance, we construct a classifier and closeness test that require at most Õ(n) samples to attain the n-sample accuracy of the best symmetric classifier or closeness test designed with knowledge of the underlying distributions. Both algorithms run in time linear in the number of samples. Conversely we also show that for any classifier or closeness test, there are distributions that require Ω̃(n) samples to achieve the n-sample accuracy of the best symmetric algorithm that knows the underlying distributions.
منابع مشابه
Pitman-Closeness of Preliminary Test and Some Classical Estimators Based on Records from Two-Parameter Exponential Distribution
In this paper, we study the performance of estimators of parametersof two-parameter exponential distribution based on upper records. The generalized likelihood ratio (GLR) test was used to generate preliminary test estimator (PTE) for both parameters. We have compared the proposed estimator with maximum likelihood (ML) and unbiased estimators (UE) under mean-squared error (MSE) and Pitman me...
متن کاملCompetitive Closeness Testing
We test whether two sequences are generated by the same distribution or by two different ones. Unlike previous work, we make no assumptions on the distributions’ support size. Additionally, we compare our performance to that of the best possible test. We describe an efficiently-computable algorithm based on pattern maximum likelihood that is near optimal whenever the best possible error probabi...
متن کاملNegative Selection Based Data Classification with Flexible Boundaries
One of the most important artificial immune algorithms is negative selection algorithm, which is an anomaly detection and pattern recognition technique; however, recent research has shown the successful application of this algorithm in data classification. Most of the negative selection methods consider deterministic boundaries to distinguish between self and non-self-spaces. In this paper, two...
متن کاملOptimizing Cost Function in Imperialist Competitive Algorithm for Path Coverage Problem in Software Testing
Search-based optimization methods have been used for software engineering activities such as software testing. In the field of software testing, search-based test data generation refers to application of meta-heuristic optimization methods to generate test data that cover the code space of a program. Automatic test data generation that can cover all the paths of software is known as a major cha...
متن کاملAn Improvement in Support Vector Machines Algorithm with Imperialism Competitive Algorithm for Text Documents Classification
Due to the exponential growth of electronic texts, their organization and management requires a tool to provide information and data in search of users in the shortest possible time. Thus, classification methods have become very important in recent years. In natural language processing and especially text processing, one of the most basic tasks is automatic text classification. Moreover, text ...
متن کامل